Orthographic and Morphological Correspondences between Related Slavic Languages as a Base for Modeling of Mutual Intelligibility

Authors

  • Andrea K. Fischer
  • Klara Jagrova
  • Irina Stenger
  • Tania Avgustinova
  • Dietrich Klakow
  • Roland Marti

Abstract

In an intercomprehension scenario, a native speaker of language L1 is typically confronted with output from an unknown but related language L2. In this setting, the degree to which the receiver recognizes the unfamiliar words largely determines communicative success. Despite exhibiting great string-level differences, cognates may be recognized very successfully if the receiver is aware of regular correspondences that make it possible to transform the unknown word into its familiar form. Modeling L1-L2 intercomprehension therefore requires identifying all the regular correspondences between languages L1 and L2. Here we present a set of linguistic orthographic correspondences manually compiled from the comparative linguistics literature, along with a set of statistically inferred suggestions for correspondence rules. For the statistical inference, we followed the Minimum Description Length principle, which proposes choosing those rules that are most effective at describing the data. Our statistical model was able to reproduce most of our linguistic correspondences (88.5% for Czech-Polish and 75.7% for Bulgarian-Russian) and furthermore made it easy to identify many more non-trivial correspondences, which also cover aspects of morphology.
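
The Minimum Description Length idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' model: the function name `description_length`, the two-part cost scheme, the alphabet size of 32, and the five Czech-Polish toy pairs are all assumptions chosen for illustration. The sketch charges a fixed number of bits for stating each rule and then measures how many bits are needed to encode the target words given the source words and the rules.

```python
import math
import unicodedata


def description_length(pairs, rules, alphabet_size=32):
    """Two-part code length in bits (illustrative cost scheme, not the paper's).

    pairs: (source, target) words of equal length, assumed pre-aligned
           character by character.
    rules: dict mapping a source character to its predicted target character;
           characters without a rule are predicted to stay unchanged.
    """
    char_bits = math.log2(alphabet_size)
    # Model cost: each rule names one source and one target character.
    model_cost = len(rules) * 2 * char_bits
    data_cost = 0.0
    for src, tgt in pairs:
        # Normalize so that accented letters are single code points.
        src = unicodedata.normalize("NFC", src)
        tgt = unicodedata.normalize("NFC", tgt)
        if len(src) != len(tgt):
            raise ValueError("pairs must be aligned to equal length")
        for s, t in zip(src, tgt):
            predicted = rules.get(s, s)
            # 1 bit flags hit or miss; a miss also spells out the character.
            data_cost += 1.0 if predicted == t else 1.0 + char_bits
    return model_cost + data_cost


# Hypothetical toy data: Czech-Polish cognates of equal length.
toy_pairs = [
    ("voda", "woda"),
    ("nový", "nowy"),
    ("noha", "noga"),
    ("víno", "wino"),
    ("hora", "góra"),
]

no_rules = description_length(toy_pairs, {})
two_rules = description_length(toy_pairs, {"v": "w", "h": "g"})
three_rules = description_length(toy_pairs, {"v": "w", "h": "g", "ý": "y"})
print(no_rules, two_rules, three_rules)  # 60.0 55.0 60.0
```

Under these assumed costs, the frequent correspondences v:w and h:g shorten the total description from 60 to 55 bits, while adding ý:y, which occurs only once in the toy data, costs more to state than it saves. This is the sense in which an MDL criterion prefers rules that describe the data most effectively.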

Similar resources

The empirical basis of Slavic intercomprehension

The possibility of intercomprehension between related languages is a generally accepted fact suggesting that mutual intelligibility is systematic. Of particular interest are the Slavic languages, which are “sufficiently similar and sufficiently different to provide an attractive research laboratory” (Corbett 1998). They exhibit practically all typologically attested means of encoding grammatica...

"Reading Polish with Czech Eyes" or "How Russian Can a Bulgarian Text Be?": Orthographic Differences as an Experimental Variable in Reading Comprehension

The human language processing mechanism shows a remarkable robustness to different kinds of imperfect linguistic signals. However, it is unclear how exactly a message encoded in one system is decoded by persons used to a different system. We are interested in gaining insights about human performance at retrieving information encoded in an unfamiliar encoding system. Our focus lies on reading in...

Lexical and orthographic distances between Germanic, Romance and Slavic languages and their relationship to geographic distance

When reading texts in different but closely related languages, intelligibility is determined, among other factors, by the number of words that are cognates of words in the reader’s language and by orthographic differences. Orthographic differences partly reflect pronunciation differences and therefore are partly a linguistic level. Dialectometric studies in particular showed that different linguistic lev...

A Knowledge-Rich Approach to Measuring the Similarity between Bulgarian and Russian Words

We propose a novel knowledge-rich approach to measuring the similarity between a pair of words. The algorithm is tailored to Bulgarian and Russian and takes into account the orthographic and the phonetic correspondences between the two Slavic languages: it combines lemmatization, hand-crafted transformation rules, and weighted Levenshtein distance. The experimental results show an 11-pt interpo...

Modeling Intelligibility of Written Germanic Languages: Do We Need to Distinguish between Orthographic Stem and Affix Variation?

We measured orthographic differences between five Germanic languages. First, we tested the hypothesis that orthographic stem variation among languages does not correlate with orthographic variation in inflectional affixes. We found this hypothesis true when considering the aggregated stem and affix distances between the languages. We also correlated the stem and affix distances of the cognate p...

Publication date: 2016